7 research outputs found

    Several approaches for tweet topic classification in COSET - IberEval 2017

    [EN] These working notes summarize the different approaches we have explored in order to classify a corpus of tweets related to the 2015 Spanish General Election (COSET 2017 task from IberEval 2017). Two approaches were tested during the COSET 2017 evaluations: Neural Networks with Sentence Embeddings (based on TensorFlow) and N-gram Language Models (based on SRILM). Our results with these approaches were modest: both ranked above the "Most frequent" baseline, but below the "Bag-of-words + SVM" baseline. A third approach was tried after the COSET 2017 evaluation phase was over: Advanced Linear Models (based on fastText). Results measured over the COSET 2017 Dev and Test sets show that this approach is well above the "TF-IDF + RF" baseline.

    Villar Lafuente, C.; Garcés Díaz-Munío, G. (2017). Several approaches for tweet topic classification in COSET - IberEval 2017. CEUR Workshop Proceedings. 36-42. http://hdl.handle.net/10251/166361
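
    As a rough illustration of the fastText-based third approach, the sketch below trains and evaluates a supervised fastText classifier on labelled tweets. The file names, hyperparameter values and example tweet are assumptions for illustration, not the paper's actual configuration.

```python
# Minimal fastText text-classification sketch (illustrative only).
# Assumes training/dev files in fastText format, one tweet per line:
#   "__label__POLICY some tweet text"
import fasttext

model = fasttext.train_supervised(
    input="coset_train.txt",  # hypothetical file name
    lr=0.5,                   # learning rate (illustrative value)
    epoch=25,                 # training epochs (illustrative value)
    wordNgrams=2,             # add word bigrams as features
)

# test() returns (number_of_samples, precision@1, recall@1).
n, p_at_1, r_at_1 = model.test("coset_dev.txt")
print(f"Dev precision@1: {p_at_1:.3f} over {n} tweets")

# Classify a single (made-up) tweet.
labels, probs = model.predict("el debate sobre empleo y economía", k=1)
print(labels[0], probs[0])
```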

    The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task

    [EN] This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.

    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon); the Government of Spain's research project Multisub, ref. RTI2018-094879-B-I00 (MCIU/AEI/FEDER, EU); and the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Iranzo-Sánchez, J.; Garcés Díaz-Munío, G.; Civera Saiz, J.; Juan, A. (2019). The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task. The Association for Computational Linguistics. 218-224. https://doi.org/10.18653/v1/W19-5320
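
    The "synthetic data" mentioned above commonly refers to back-translation. A minimal sketch of that idea follows, assuming a hypothetical translate_reverse() helper standing in for a trained target-to-source model; this is not the submitted MLLP-UPV pipeline itself.

```python
# Back-translation sketch (illustrative, not the submitted system).
# Idea: translate monolingual *target*-language text into the source
# language with a reverse model, then use (synthetic source, real target)
# pairs as additional parallel training data.

def translate_reverse(sentence: str) -> str:
    """Hypothetical stand-in for a trained target->source NMT model."""
    raise NotImplementedError

def make_synthetic_parallel(mono_tgt_path: str, out_src_path: str, out_tgt_path: str) -> None:
    with open(mono_tgt_path) as mono, \
         open(out_src_path, "w") as src_out, \
         open(out_tgt_path, "w") as tgt_out:
        for target_sentence in mono:
            target_sentence = target_sentence.strip()
            src_out.write(translate_reverse(target_sentence) + "\n")  # synthetic side
            tgt_out.write(target_sentence + "\n")                     # real, clean side
```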

    The MLLP-UPV German-English Machine Translation System for WMT18

    [EN] This paper describes the statistical machine translation system built by the MLLP research group of the Universitat Politècnica de València for the German–English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture-based neural machine translation systems. To train our system under "constrained" conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.

    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon); the Spanish government's TIN2015-68326-R (MINECO/FEDER) research project MORE, university collaboration grant programme 2017-2018, and faculty training scholarship FPU13/06241; the Generalitat Valenciana's predoctoral research scholarship ACIF/2017/055; as well as the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Iranzo-Sánchez, J.; Baquero-Arnal, P.; Garcés Díaz-Munío, G.; Martínez-Villaronga, A.; Civera Saiz, J.; Juan, A. (2018). The MLLP-UPV German-English Machine Translation System for WMT18. Association for Computational Linguistics (ACL). 418-424. https://doi.org/10.18653/v1/W18-6414
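
    The character-based LM scoring used for data filtering can be pictured roughly as follows. This sketch scores each side of a sentence pair with a character-level KenLM model and keeps pairs above a threshold; the model file, the single shared LM and the threshold value are assumptions for illustration, not the paper's exact setup.

```python
# Parallel-data filtering sketch with a character-level LM (illustrative).
# Assumes a KenLM ARPA model trained on clean text rendered as
# space-separated characters, e.g. "h e l l o".
import kenlm

char_lm = kenlm.Model("clean_chars.arpa")  # hypothetical model file

def logprob_per_char(sentence: str) -> float:
    chars = " ".join(sentence.replace(" ", "_"))  # make spaces visible tokens
    return char_lm.score(chars) / max(len(sentence), 1)

def keep_pair(src: str, tgt: str, threshold: float = -2.5) -> bool:
    # Keep a pair only if both sides look like fluent text under the LM
    # (a single LM is shown for brevity; one per language would be used).
    return logprob_per_char(src) > threshold and logprob_per_char(tgt) > threshold
```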

    MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension

    [EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of the Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and includes an extension of the work consisting of building and evaluating equivalent systems under the closed data conditions of the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 seconds. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. From these, we highlight the system c2-streaming_600ms_t which, following a similar configuration to the primary system with a smaller context window of 0.6 s, scored 16.9% WER on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small WER degradation of 6% relative. As an extension, the equivalent closed-condition systems obtained 23.3% and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% and 20.4% WER; i.e., not far behind the top-performing systems with only 5% of the full acoustic data, while remaining streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams.

    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5Gon) and 952215 (TAILOR), and the Erasmus+ Education programme under grant agreement no. 20-226-093604-SCH (EXPERT); the Government of Spain's grant RTI2018-094879-B-I00 (Multisub) funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe", and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Baquero-Arnal, P.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2022). MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension. Applied Sciences. 12(2):1-14. https://doi.org/10.3390/app12020804
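
    The "6% relative" degradation quoted above follows directly from the two WER figures:

\[
\frac{16.9 - 16.0}{16.0} \approx 0.056 \approx 6\%\ \text{relative}
\]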

    Towards cross-lingual voice cloning in higher education

    [EN] The rapid progress of modern AI tools for automatic speech recognition and machine translation is leading to a progressive cost reduction in producing publishable subtitles for educational videos in multiple languages. Similarly, text-to-speech technology is experiencing large improvements in terms of quality, flexibility and capabilities. In particular, state-of-the-art systems are now capable of seamlessly dealing with multiple languages and speakers in an integrated manner, thus enabling the cloning of a lecturer's voice in languages they might not even speak. This work reports the experience gained in using such systems at the Universitat Politècnica de València (UPV), mainly as guidance for other educational organizations willing to conduct similar studies. It builds on previous work on the UPV's main repository of educational videos, MediaUPV, to produce multilingual subtitles at scale and low cost. Here, a detailed account is given of how this work has been extended to also allow for massive machine dubbing of MediaUPV. This includes collecting 59 h of clean speech data from UPV's academic staff, and extending our production pipeline of subtitles with a state-of-the-art multilingual and multi-speaker text-to-speech system trained from the collected data. Our main result comes from an extensive, subjective evaluation of this system by the lecturers contributing to data collection. In brief, it is shown that text-to-speech technology is not only mature enough for application to MediaUPV, but also needed as soon as possible by students to improve its accessibility and bridge language barriers.

    We wish first to thank all UPV lecturers who made this study possible. We are also very grateful for the funding support received from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon), the Spanish government under grant RTI2018-094879-B-I00 (Multisub, MCIU/AEI/FEDER), and the Universitat Politècnica de València's PAID-01-17 R&D support programme. Funding for open access charge: CRUE-Universitat Politècnica de València.

    Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Giménez Pastor, A.; Silvestre Cerdà, JA.; Sanchis Navarro, JA.; Civera Saiz, J.; Jiménez, M.... (2021). Towards cross-lingual voice cloning in higher education. Engineering Applications of Artificial Intelligence. 105:1-9. https://doi.org/10.1016/j.engappai.2021.104413
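
    Conceptually, the dubbing extension chains the existing subtitle pipeline with the new TTS system. The schematic sketch below uses hypothetical helpers; the segment format and the synthesize() call are assumptions for illustration, not MediaUPV's actual interfaces.

```python
# Machine-dubbing pipeline sketch (illustrative; all helpers hypothetical).
# Subtitle segments are assumed as (start_s, end_s, translated_text) tuples
# produced by the existing ASR + MT subtitling pipeline.

def synthesize(text: str, speaker_id: str, language: str) -> bytes:
    """Hypothetical multilingual, multi-speaker TTS call."""
    raise NotImplementedError

def dub_video(segments, speaker_id: str, language: str):
    audio_track = []
    for start_s, end_s, text in segments:
        audio = synthesize(text, speaker_id, language)
        # A real system would time-scale the audio to fit the
        # [start_s, end_s] window before mixing it into the video.
        audio_track.append((start_s, audio))
    return audio_track
```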

    Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization

    [EN] We introduce Europarl-ASR, a large speech and text corpus of parliamentary debates including 1300 hours of transcribed speeches and 70 million tokens of text in English extracted from European Parliament sessions. The training set is labelled with the Parliament's non-fully-verbatim official transcripts, time-aligned. As verbatimness is critical for acoustic model training, we also provide automatically noise-filtered and automatically verbatimized transcripts of all speeches, based on speech data filtering and verbatimization techniques. Additionally, 18 hours of transcribed speeches were manually verbatimized to build reliable speaker-dependent and speaker-independent development/test sets for streaming ASR benchmarking. The availability of manual non-verbatim and verbatim transcripts for dev/test speeches makes this corpus useful for the assessment of automatic filtering and verbatimization techniques. This paper describes the corpus and its creation, and provides offline and streaming ASR baselines for both the speaker-dependent and speaker-independent tasks using the three training transcription sets. The corpus is publicly released under an open licence.

    This work has received funding from the EU's H2020 research and innovation programme under grant agreements 761758 (X5gon) and 952215 (TAILOR); the Government of Spain's research project Multisub (RTI2018-094879-B-I00, MCIU/AEI/FEDER, EU) and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Baquero-Arnal, P.; Roselló, N.... (2021). Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization. International Speech Communication Association (ISCA). 3695-3699. https://doi.org/10.21437/Interspeech.2021-1905
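
    One way to picture the speech data filtering idea assessed with this corpus: official transcripts that diverge too much from what was actually said (approximated here by an ASR hypothesis) are dropped from acoustic-model training. The sketch below, including the jiwer dependency and the threshold value, is an illustrative assumption, not the paper's exact technique.

```python
# Lightly supervised transcript filtering sketch (illustrative only).
from jiwer import wer  # pip install jiwer

def keep_segment(official_transcript: str, asr_hypothesis: str, max_wer: float = 0.25) -> bool:
    # Keep a segment for training only if the official (non-verbatim)
    # transcript is close enough to the recognized speech.
    return wer(official_transcript, asr_hypothesis) <= max_wer
```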

    MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge

    [EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of the Universitat Politècnica de València for the Albayzin-RTVE 2020 Speech-to-Text Challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid BLSTM-HMM ASR system using streaming one-pass decoding with a context window of 1.5 seconds and a linear combination of an n-gram, an LSTM and a Transformer language model (LM). The acoustic model was trained on nearly 4,000 hours of speech data from different sources, using the MLLP's transLectures-UPV toolkit (TLK) and TensorFlow; the LMs were trained using SRILM (n-gram), CUED-RNNLM (LSTM) and Fairseq (Transformer), with up to 102G tokens. This system achieved 11.6% and 16.0% WER on the test-2018 and test-2020 sets, respectively. As it is streaming-enabled, it could be put into production environments for automatic captioning of live media streams, with a theoretical delay of 1.5 seconds. Along with the primary system, we also submitted three contrastive systems. From these, we highlight the system c2-streaming_600ms_t which, following the same configuration as the primary one but using a smaller context window of 0.6 seconds and a Transformer LM, scored 12.3% and 16.9% WER respectively on the same test sets, with a measured empirical latency of 0.81 ± 0.09 seconds (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small WER degradation of 6% relative.

    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5Gon); the Government of Spain's research project Multisub (ref. RTI2018-094879-B-I00, MCIU/AEI/FEDER, EU) and FPU scholarships FPU14/03981 and FPU18/04135; and the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055.

    Jorge-Cano, J.; Giménez Pastor, A.; Baquero-Arnal, P.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2021). MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge. 118-122. https://doi.org/10.21437/IberSPEECH.2021-25
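
    The "linear combination" of language models in the primary system can be written as a weighted interpolation of the three models' probabilities (the interpolation weights are typically tuned on held-out data; their values are not given here):

\[
P(w \mid h) = \lambda_{1}\,P_{\text{n-gram}}(w \mid h) + \lambda_{2}\,P_{\text{LSTM}}(w \mid h) + \lambda_{3}\,P_{\text{Transformer}}(w \mid h), \qquad \sum_{i}\lambda_{i}=1,\ \lambda_{i}\ge 0
\]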